Fujitsu Laboratories TREC8 Report
Authors
Abstract
This year a Fujitsu Laboratories team participated in three tracks: ad hoc, small web, and large web. As basic techniques, we compared four popular stemmers and devised simple stop-pattern removal techniques for TREC queries. For the ad hoc task and the small web track, we used the same techniques. We experimented with area weighting, co-occurrence boosting, bi-gram utilization, and reranking by bi-gram extraction from a pilot search. The effect of blind application of those techniques was rather limited, or even uncertain, in the TREC8 experiment. What we can say from the TREC8 results is that blind application of co-occurrence boosting and area weighting may be effective for the small web track. They require query-dependent application. In the large web track, our main interest is efficiency, that is, how many resources are required to process 100GB of web text and 10000 real web queries in practical time. Using a statistics-based language type checker, we can eliminate 23% of non-English text. This speeds up indexing and reduces the index size. Searching an inverted file is CPU intensive if the target machine has main memory in excess of 10-25% of the index size. So with simple but effective index compression methods, the throughput of query processing is about 0.54-1.1 queries/second even on a single 300MHz UltraSPARC processor.
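The abstract does not name the index compression methods that were used. As a rough illustration only, the sketch below shows one commonly used simple scheme for inverted files: store the gaps between sorted document IDs and encode each gap with a variable number of bytes. The function names `encode_postings` and `decode_postings` are hypothetical and are not taken from the report.

```python
from typing import Iterable, List


def encode_postings(doc_ids: Iterable[int]) -> bytes:
    """Gap-encode ascending doc IDs, then variable-byte encode each gap.

    Assumption: this is an illustrative scheme, not the report's method.
    """
    out = bytearray()
    previous = 0
    for doc_id in doc_ids:
        gap = doc_id - previous
        if gap < 0:
            raise ValueError("doc IDs must be in ascending order")
        previous = doc_id
        # Emit 7 bits per byte, low-order group first; the high bit
        # is set only on the final byte of each number.
        while gap >= 128:
            out.append(gap & 0x7F)
            gap >>= 7
        out.append(gap | 0x80)
    return bytes(out)


def decode_postings(data: bytes) -> List[int]:
    """Invert encode_postings: rebuild the gaps and sum them back to doc IDs."""
    doc_ids: List[int] = []
    previous = 0
    current = 0
    shift = 0
    for byte in data:
        if byte & 0x80:
            # Final byte of a gap: finish the number and emit a doc ID.
            current |= (byte & 0x7F) << shift
            previous += current
            doc_ids.append(previous)
            current = 0
            shift = 0
        else:
            current |= byte << shift
            shift += 7
    return doc_ids


postings = [3, 7, 300, 100000]
encoded = encode_postings(postings)
restored = decode_postings(encoded)
```

Small gaps fit in a single byte, so dense posting lists shrink considerably compared with fixed-width integers, which is one reason such schemes help when only part of the index fits in main memory.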
Similar resources
Fujitsu Laboratories TREC9 Report
This year a Fujitsu Laboratories team participated in the web tracks. For TREC9 we experimented with passage retrieval, which is expected to be effective for Web pages that contain more than one topic. To split documents into passages, we used an NLP-based paragraph detection program rather than a fixed (or variable) window size. But it did not produce better results for the TREC9 Web data. For indexing large web data faster, ...
Fujitsu Laboratories TREC2001 Report
This year a Fujitsu Laboratories team participated in the web tracks. For both the ad hoc task and the entry point search task, we combined the score of normal ranking search with that of page ranking techniques. For the ad hoc style task, the effect of page ranking was very limited. We got only a very small improvement for title field search, and the page rank was not effective for description and narrative field sear...
Fujitsu Laboratories TREC8 Report - Ad hoc, Small Web, and Large Web Track
This year a Fujitsu Laboratories team participated in three tracks: ad hoc, small web, and large web. As basic techniques, we compared four popular stemmers and devised simple stop-pattern removal techniques for TREC queries. For the ad hoc task and the small web track, we used the same techniques. We experimented with area weighting, co-occurrence boosting, bi-gram utilization,...
Fujitsu Laboratories TREC7 Report
In our first participation in TREC, our focus was on improving the basic ranking systems and applying text clustering techniques for query expansion. We tested a variety of techniques including reference measures, passage retrieval, and data fusion for the basic ranking systems. Some techniques were used in the official run; others were not used because of time limitations. We applied the text cl...
Controlling Polarization in Quantum-dot Semiconductor Optical Amplifiers
1 Fujitsu Limited and Optoelectronic Industry and Technology Development Association 2 Institute for Nano Quantum Information Electronics (INQIE), The University of Tokyo 3 Fujitsu Limited and Optoelectronic Industry and Technology Development Association 4 Fujitsu Laboratories Limited 5 Department of Electrical and Electronics Engineering, Facility of Engineering, Kobe University 6 Department ...